Redundancy-weighting for better inference of protein structural features
Abstract
MOTIVATION: Structural knowledge, extracted from the Protein Data Bank (PDB), underlies numerous potential functions and prediction methods. The PDB, however, is highly biased: many proteins have more than one entry, while entire protein families are represented by a single structure, or even not at all. The standard solution to this problem is to limit the studies to non-redundant subsets of the PDB. While alleviating biases, this solution hides the many-to-many relations between sequences and structures. That is, non-redundant datasets conceal the diversity of sequences that share the same fold and the existence of multiple conformations for the same protein. A particularly disturbing aspect of non-redundant subsets is that they hardly benefit from the rapid pace of protein structure determination, as most newly solved structures fall within existing families.

RESULTS: In this study we explore the concept of redundancy-weighted datasets, originally suggested by Miyazawa and Jernigan. Redundancy-weighted datasets include all available structures and associate them (or features thereof) with weights that are inversely proportional to the number of their homologs. Here, we provide the first systematic comparison of redundancy-weighted datasets with non-redundant ones. We test three weighting schemes and show that the distributions of structural features that they produce are smoother (having higher entropy) compared with the distributions inferred from non-redundant datasets. We further show that these smoothed distributions are both more robust and more correct than their non-redundant counterparts. We suggest that the better distributions, inferred using redundancy-weighting, may improve the accuracy of knowledge-based potentials and increase the power of protein structure prediction methods. Consequently, they may enhance model-driven molecular biology.
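The core idea in the abstract, weighting each structure inversely to the number of its homologs and then comparing feature distributions by entropy, can be illustrated with a minimal Python sketch. This is not one of the three weighting schemes tested in the paper, whose details the abstract does not give. It assumes the simplest possible variant, in which each entry is weighted by one over the size of its family. The family labels, feature labels, and function names are all hypothetical.

```python
from collections import Counter, defaultdict
import math


def redundancy_weights(family_ids):
    """Weight each entry by 1 / (size of its family).

    Assumption: homologs are given as precomputed family labels, so the
    weights of each family sum to 1 regardless of how many entries it has.
    """
    sizes = Counter(family_ids)
    return [1.0 / sizes[f] for f in family_ids]


def weighted_distribution(features, weights):
    """Accumulate weights per feature value and normalize to probabilities."""
    totals = defaultdict(float)
    for feature, weight in zip(features, weights):
        totals[feature] += weight
    norm = sum(totals.values())
    return {feature: total / norm for feature, total in totals.items()}


def shannon_entropy(dist):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical toy data: three entries from one family, one from another.
family_ids = ["famA", "famA", "famA", "famB"]
features = ["helix", "helix", "strand", "coil"]

weights = redundancy_weights(family_ids)
dist = weighted_distribution(features, weights)
entropy_bits = shannon_entropy(dist)
```

In this toy case the over-represented family contributes the same total weight as the singleton family, so no structure is discarded but no family dominates the inferred distribution. Whether the resulting distribution has higher entropy than one from a non-redundant subset depends on the data; the toy example makes no claim either way.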
Journal title:
- Bioinformatics
Volume 30, Issue 16
Pages -
Publication date: 2014